{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "lnXme3b4ks6v",
    "papermill": {
     "duration": 0.012399,
     "end_time": "2021-04-05T17:55:52.635137",
     "exception": false,
     "start_time": "2021-04-05T17:55:52.622738",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# Dosearch for Causal Identification in DAGs.\n",
    "\n",
    "\n",
    "This a simple notebook for teaching that illustrates capabilites of the \"dosearch\" package, which is a great tool.\n",
    "\n",
    "NB. In my experience, the commands are sensitive to syntax ( e.g. spacing when -> are used), so be careful when changing to other examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "vYsOte6Aks6z",
    "outputId": "fe46224e-cf6c-45e2-f2d4-592fd6a0d99a"
   },
   "outputs": [],
   "source": [
    "%pip install rpy2\n",
    "%load_ext rpy2.ipython\n",
    "# import subprocess\n",
    "# subprocess.run('conda install -c conda-forge r-base', shell=True)\n",
    "# !pip install rpy2\n",
    "import numpy as np\n",
    "import statsmodels.formula.api as smf\n",
    "import rpy2\n",
    "from rpy2.robjects import r, pandas2ri\n",
    "pandas2ri.activate()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19",
    "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5",
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "8QMOJZrpks62",
    "outputId": "65cdb82c-1a42-4261-b29c-42abf5c1f4b5",
    "papermill": {
     "duration": 61.517411,
     "end_time": "2021-04-05T17:56:54.163802",
     "exception": false,
     "start_time": "2021-04-05T17:55:52.646391",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "install.packages(\"dosearch\", \".\")\n",
    "library(\"dosearch\", lib.loc=\".\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "XSX2CIz-ks62",
    "papermill": {
     "duration": 0.012155,
     "end_time": "2021-04-05T17:56:54.189437",
     "exception": false,
     "start_time": "2021-04-05T17:56:54.177282",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We start with the simplest graph, with the simplest example\n",
    "where $D$ is policy, $Y$ is outcomes, $X$ is a confounder:\n",
    "$$\n",
    "D\\to Y, \\quad X \\to (D,Y)\n",
    "$$\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "6bzYI4kAks63",
    "papermill": {
     "duration": 0.013694,
     "end_time": "2021-04-05T17:56:54.215310",
     "exception": false,
     "start_time": "2021-04-05T17:56:54.201616",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Now suppose we want conditional average policy effect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "cWDRUdSSks63",
    "outputId": "18c017c5-9c0f-4cf1-b900-95dd444a2d60",
    "papermill": {
     "duration": 0.202016,
     "end_time": "2021-04-05T17:56:54.429124",
     "exception": false,
     "start_time": "2021-04-05T17:56:54.227108",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "graph <- \"x -> y\n",
    "  x -> d\n",
    "  d -> y\"\n",
    "dosearch(\"p(y,d,x)\", \"p(y | do(d),x)\", graph)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "hgBVCQ1lks63",
    "papermill": {
     "duration": 0.012715,
     "end_time": "2021-04-05T17:56:54.454351",
     "exception": false,
     "start_time": "2021-04-05T17:56:54.441636",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "This recovers the correct identification formula for law of the counterfactual $Y(d)$ induced by $do(D=d)$:\n",
    "$$\n",
    "p_{Y(d)|X}(y|x) := p(y|do(d),x) = p(y|d,x).\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "c1vRLjKYks63",
    "outputId": "ac3f4bb5-79a2-4f35-9e01-980db04863bc",
    "papermill": {
     "duration": 0.043133,
     "end_time": "2021-04-05T17:56:54.510274",
     "exception": false,
     "start_time": "2021-04-05T17:56:54.467141",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "dosearch(\"p(y,d,x)\", \"p(y | do(d))\", graph)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zgUxN4drks64",
    "papermill": {
     "duration": 0.01273,
     "end_time": "2021-04-05T17:56:54.535701",
     "exception": false,
     "start_time": "2021-04-05T17:56:54.522971",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "This recovers the correct identification formula:\n",
    "$$\n",
    "p_{Y(d)}(y) := p(y: do(d)) = \\sum_{x}\\left(p(x)p(y|d,x)\\right)\n",
    "$$\n",
    "We integrate out $x$ in the previous formula.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "OQV-FVU7ks64",
    "papermill": {
     "duration": 0.013627,
     "end_time": "2021-04-05T17:56:54.562271",
     "exception": false,
     "start_time": "2021-04-05T17:56:54.548644",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Suppose we don't observe the confounder. The effect is generally not identified.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "tnZRBXegks64",
    "outputId": "5715de4c-c0c2-490b-e195-10396c89eeb4",
    "papermill": {
     "duration": 0.046266,
     "end_time": "2021-04-05T17:56:54.621562",
     "exception": false,
     "start_time": "2021-04-05T17:56:54.575296",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "data <- \"p(y,d)\"\n",
    "query <- \"p(y | do(d))\"\n",
    "graph <- \"x -> y\n",
    "  x -> d\n",
    "  d -> y\"\n",
    "dosearch(data, query, graph)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "9wWIX032ks65",
    "papermill": {
     "duration": 0.013931,
     "end_time": "2021-04-05T17:56:54.649609",
     "exception": false,
     "start_time": "2021-04-05T17:56:54.635678",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "The next graph is an example of J. Pearl (different notation), where the graph is considerably more complicated. See Pearl's Example in the text - e.g. Figure 7.14. We are interested in $D \\to Y$.\n",
    "\n",
    "Here we try conditioning on $X_2$. This would block one backdoor path from $D$ to $Y$, but would open another path on which $X_2$ is a collider, so this shouldn't work. The application below gave a correct answer (after I put the spacings carefully).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "Cs2Y8jtdks65",
    "outputId": "96126887-4eef-47e5-ed1d-aa8e3bf04166",
    "papermill": {
     "duration": 0.136442,
     "end_time": "2021-04-05T17:56:54.800269",
     "exception": false,
     "start_time": "2021-04-05T17:56:54.663827",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "graph<- \"z1 -> x1\n",
    "z1 -> x2\n",
    "z2 -> x2\n",
    "z2 -> x3\n",
    "x2 -> d\n",
    "x2 -> y\n",
    "x3 -> y\n",
    "x1 -> d\n",
    "d -> m\n",
    "m -> y\n",
    "\"\n",
    "dosearch(\"p(y,d,x2)\", \"p(y|do(d))\", graph)  # observed only (Y, D, X_2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QStKCsfLks65",
    "papermill": {
     "duration": 0.013831,
     "end_time": "2021-04-05T17:56:54.828405",
     "exception": false,
     "start_time": "2021-04-05T17:56:54.814574",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Intuitively, we should add more common causes. For example, adding $X_3$ and using $S = (X_2, X_3)$ should work."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "RBUM1lglks65",
    "outputId": "aae85beb-ad30-4a91-cf78-f23f9d840db9",
    "papermill": {
     "duration": 0.239136,
     "end_time": "2021-04-05T17:56:55.081723",
     "exception": false,
     "start_time": "2021-04-05T17:56:54.842587",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "print(dosearch(\"p(y,d,x2,x3)\", \"p(y|do(d),x2, x3)\", graph))  # can ID conditional average effect?\n",
    "print(dosearch(\"p(y,d,x2,x3)\", \"p(y|do(d))\", graph))  # can ID unconditional effect?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4hDBOrfMks65",
    "papermill": {
     "duration": 0.014749,
     "end_time": "2021-04-05T17:56:55.112369",
     "exception": false,
     "start_time": "2021-04-05T17:56:55.097620",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "This retrieves the correct formulas for counterfactual distributions of $Y(d)$ induced by $Do(D=d)$:\n",
    "\n",
    "The conditional distribution is identified by\n",
    "$$\n",
    "p_{Y(d) \\mid X_2, X_3}(y) := p(y |x_2, x_3: do(d)) = p(y|x_2,x_3,d).\n",
    "$$\n",
    "\n",
    "The unconditional distribution is obtained by integrating out $x_2$ and $x_3$:\n",
    "\n",
    "$$\n",
    "p_{Y(d) }(y) :=  p(y do(d)) = \\sum_{x2,x3}\\left(p(x_2,x_3)p(y|x_2,x_3,d)\\right).\n",
    "$$\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HwtImk2mks66",
    "papermill": {
     "duration": 0.014567,
     "end_time": "2021-04-05T17:56:55.141601",
     "exception": false,
     "start_time": "2021-04-05T17:56:55.127034",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Next we suppose that we observe only $(Y,D, M)$. Can we identify the effect $D \\to Y$?  Can we use back-door-criterion twice to get $D \\to M$ and $M \\to Y$ affect? Yes, that's called front-door criterion -- so we just need to remember only the back-door and the fact that we can use it iteratively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "NoDq-tmMks66",
    "outputId": "06118c63-004b-4aa8-e3d2-01f28255ea53",
    "papermill": {
     "duration": 0.281619,
     "end_time": "2021-04-05T17:56:55.438852",
     "exception": false,
     "start_time": "2021-04-05T17:56:55.157233",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "print(dosearch(\"p(y, d, m)\", \"p(m|do(d))\", graph))\n",
    "print(dosearch(\"p(y, d, m)\", \"p(y|do(m))\", graph))\n",
    "print(dosearch(\"p(y, d, m)\", \"p(y|do(d))\", graph))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ml5ei_Ftks66",
    "papermill": {
     "duration": 0.016414,
     "end_time": "2021-04-05T17:56:55.472018",
     "exception": false,
     "start_time": "2021-04-05T17:56:55.455604",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "So we get identification results:\n",
    "First,\n",
    "$$\n",
    "p_{M(d)}(m)  := p(m|do(d)) = p(m|d).\n",
    "$$\n",
    "Second,\n",
    "$$\n",
    "p_{Y(m)}(y) := p(y|do(m)) = \\sum_{d}\\left(p(d)p(y|d,m)\\right),\n",
    "$$\n",
    "and the last by integrating the product of these two formulas:\n",
    "$$\n",
    "p_{Y(d)}(y) := p(y|do(d)) = \\sum_{m}\\left(p(m|d)\\sum_{d}\\left(p(d)p(y|d,m)\\right)\\right)\n",
    "$$\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "W5woWqUfks66",
    "papermill": {
     "duration": 0.015846,
     "end_time": "2021-04-05T17:56:55.503891",
     "exception": false,
     "start_time": "2021-04-05T17:56:55.488045",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "The package is very rich and allows identification analysis, when the data comes from multiple sources. Suppose we observe marginal distributions $(Y,D)$  and $(D,M)$ only. Can we identify the effect of $D \\to Y$. The answer is (guess) and the package correctly recovers it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "RtyCW6pmks66",
    "outputId": "4e8d65d6-fe87-418b-c5ac-1fba5650b4c4",
    "papermill": {
     "duration": 0.592959,
     "end_time": "2021-04-05T17:56:56.112631",
     "exception": false,
     "start_time": "2021-04-05T17:56:55.519672",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%R\n",
    "graph<- \"z1 -> x1\n",
    "z1 -> x2\n",
    "z2 -> x2\n",
    "z2 -> x3\n",
    "x2 -> d\n",
    "x2 -> y\n",
    "x3 -> y\n",
    "x1 -> d\n",
    "d -> m\n",
    "m -> y\n",
    "\"\n",
    "print(dosearch(\"p(y,m)\\np(m,d)\", \"p(m|do(d))\" , graph))\n",
    "print(dosearch(\"p(y,m)\\np(m,d)\", \"p(y|do(m))\", graph))\n",
    "print(dosearch(\"p(y,m)\\np(m,d)\", \"p(y|do(d))\", graph))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "65hu5mmBks67"
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  },
  "papermill": {
   "default_parameters": {},
   "duration": 68.035072,
   "end_time": "2021-04-05T17:56:57.154022",
   "environment_variables": {},
   "exception": null,
   "input_path": "__notebook__.ipynb",
   "output_path": "__notebook__.ipynb",
   "parameters": {},
   "start_time": "2021-04-05T17:55:49.118950",
   "version": "2.2.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
